Thoughts on this week

This week I should prioritize reading some papers, especially the methods ones. Ie synthseg, the gamlss paper, the original braincharts paper. I also want to get a better understanding of gamlss. So I want to do at least one of the exercises attached to that video I watched. I also want to get the minor tasks from last week finished (like the lowQC scans fsleyes - especially for the MPR ones now that they are separate). I would also like to try to run a model of gamlss on the LBCC data - after adding the SLIP data to it. Perhaps try to run it for different time periods, and add the GA as a “fixed” = categorical (?) variable. To do this well and sustainably, I should edit Jenna’s code. #Aaron mtg 02/13


This is the script from Jenna’s github that would be a helpful place to start to build gamlss models. There should also be Jakob’s scripts that are more complicated but provide more flexibility (this is my understanding) in choosing a model. https://github.com/jmschabdach/mpr_analysis/blob/develop/r/build_your_own_growth_chart.R


Notes on Jenna Code - FreeSurfer model includes SurfaceHoles variable in model, not applicable in SynthSeg output. - logAge: numeric type with log(post conception age in days, base=10). In (1), we use a conversion factor of 325.25 days/year and a post conception offset of 280 days if post conception age is not available. #Setup

Play - SLIP

Load Data

Data Jenna sent is in 3 folders by release date There are 3 files in each folder - qc scores csv % synthseg volumes csv – 2155 unique subject_ids, and 11267 unique data points (subject + age at scan?) - participants tsv – 2173 total data

Put all the data together from three releases and merge the qc scores, volumes, and participant data Save this full dataset ###Load all csv data and merge

participants <- load_data(names = c(paste0("data",1:3)), paste0(folders,"participants.tsv")) %>% bind_rows(.)
qc_scores <- load_data(names = c(paste0("data",1:3)), paste0(folders,"synthseg+_qc_scores.csv")) %>% bind_rows(.)
volumes <- load_data(names = c(paste0("data",1:3)), paste0(folders,"synthseg+_volumes.csv")) %>% bind_rows(.)
colnames(qc_scores) <- c(paste0(names(qc_scores[1:8])),paste0(names(qc_scores[9:length(names(qc_scores))]),"_qc"))
qc_manual <- load_data(names = "qc", manual_qc_file) %>% bind_rows(.)


#merge
full_data <- merge(qc_scores, volumes, by = intersect(names(qc_scores), names(volumes)))
full_data <- merge(full_data, participants, by = intersect(names(full_data), names(participants)))
write.csv(full_data, file = "/Users/ekafadar/Documents/Grad_School/BGDLab/SLIP_combined_011923.csv", quote = F, row.names = F)

saved SLIP_combined_011923.csv with 11267 rows and 137 variables

###Load full data adjust variables: log10 age, get minQC column

qc_manual <- load_data(names = "qc", manual_qc_file) %>% bind_rows(.)
full_data <- read.csv("/Users/ekafadar/Documents/Grad_School/BGDLab/SLIP_data/SLIP_combined_011923.csv") 
full_data$logAge <- log10(full_data$adjusted_age_in_days)
full_data <- full_data %>% rowwise() %>% 
  mutate(minQC = min(c_across(contains("qc"))))

###Split data by scan type, ageBin

full_data$scan_type <- ifelse(grepl("MPR", full_data$scan_id), "MPR", ifelse((!grepl("MPR", full_data$scan_id) & grepl("T1w", full_data$scan_id)), "T1w_other", ifelse(grepl("T2w", full_data$scan_id), "T2w", ifelse(grepl("FLAIR", full_data$scan_id), "flair", NA))))
full_data$ageBin <- cut(full_data$age_at_scan, breaks = c(-Inf, 30, 365, 1095, 2190, 4380, Inf), labels = c("Newborn", "Infant", "Toddler", "Preschool", "School_Age", "Adolescent"), include.lowest = TRUE)
#df_MPR <- full_data[grepl("MPR", full_data$scan_id),] %>% distinct(subject_id, .keep_all = T)#947

#df_T1w <- full_data[!grepl("MPR", full_data$scan_id) & grepl("T1w", full_data$scan_id),] %>% distinct(subject_id, .keep_all = T)#1743

#df_T2w <- full_data[grepl("T2w", full_data$scan_id),] %>% distinct(subject_id, .keep_all = T)#1021

#table(grepl("FLAIR",full_data$scan_id))#891
table(full_data$scan_type)
## 
##     flair       MPR T1w_other       T2w 
##       891      1229      6519      2628

1229 MPR scans, 947 unique subject ids 6519 T1w non-MPR scans, 1743 unique subject ids 2628 T2w scans, 1021 unique subject ids 891 FLAIR scans -All 11267 scans accounted for above.

###Separate df w/ GA Use scans with MPR. df_MPR !!! Noticed on 2/6/24 that the cut-offs were inclusive, so changed them to 31.9 and 36.9, this will change the #s of subjects for the categories down the line. So should re-run things!!

GA_data <- subset(filter(full_data, scan_type == "MPR"), !is.na(gestational_age)) %>% distinct(subject_id, .keep_all = T)#358
GA_data$log_GA <- log10(GA_data$gestational_age)
#Split in GA bins
# Determine the cut points based on quantiles
cut_points <- quantile(GA_data$gestational_age, probs = c(1/3, 2/3)) # cut points are 39 and 40. So this will NOT work.

cut_points <- quantile(GA_data$log_GA, probs = c(1/3, 2/3)) # cut points are 1.59 and 1.6. So this will NOT work.

# Create a new column indicating cut points: specifying by weeks.
GA_data$preterm <- cut(GA_data$gestational_age, breaks = c(-Inf, 31.9, 36.9, Inf), labels = c("VPM", "LPM", "Term"), include.lowest = TRUE)
table(GA_data$preterm)
## 
##  VPM  LPM Term 
##    8   42  308
table(GA_data$sex)
## 
## Female   Male 
##    164    194

Gestational Age


Boxplot of preterm categories Within the MPR data

# Create a boxplot with ggplot2
ggp <- GA_boxplot(GA_data, gestational_age)

ggp + scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.3) +
    labs(x = "Preterm Categories", y = "Gestational Age") +
    theme_minimal()

hist(GA_data$gestational_age)

hist(GA_data$adjusted_age_in_days)

## QC Scores ###General QC notes synthseg QC scores is like a predicted dice coef, based on trained models, only trained for some values. this QC might not be good enough, might have to go back and do some QC Synthseg is hierarchical, ie finds cortex and then finds regions between them Confirm with Jakob, 0.65 across every single one (all should be above 0.65 within a sample) Might need to do some sensitivity analyses.

For a subset of these there should be manual QC done. What about the relationship between the synthseg QC values vs manual QC values in that subset? Might be useful in terms of convincing synthseg automated QC could be sufficient.

After the centile scores, see whether the centile score is related to QC and could include the QC score in the model (down the line, to control for it if it is affecting the centile score)

  • Make a plot of the minimum QC score for each subject distribution. Shape of the distribution. Look at the distribution for the cut-off (hopefully around 0.65).\(\checkmark\)
  • QC score against age, and against gestational age (any bias here?).\(\checkmark\)
  • Plot GA availability with recency of scan (more likely to have GA?). I don’t think I have data for when the scan took place

Expectation: A higher proportion of data would have GA if its for people under < 5, with only more recent scans Look at QC scores for variables which include QC scores: * general WM * general GM * general CSF * cerebellum * brainstem * thalamus * putamen.pallidum * hippocampus.amygdala


###Correlations of QC with age, GA Both all data and data with GA have similar distribution of QC. There is a smaller peak at the very low end of minimum QC but bulk of scans have high QC < 0.6. Lowest QC score is from the youngest age at scan, but higher scores is evenly distributed. Significant positive corr 0.33, p-value < 2.2e-16 For GA & QC score. Significant positive corr 0.15, p-value < 2.2e-16. Still sort of evenly distributed though (per plot).

hist(GA_data$minQC)

df_MPR <- filter(full_data, scan_type == "MPR") %>% distinct(subject_id, .keep_all = T)
hist(df_MPR$minQC)

#Look at relationship between age at scan & QC score
plot(df_MPR$age_at_scan, df_MPR$minQC)

cor.test(df_MPR$age_at_scan, df_MPR$minQC)
## 
##  Pearson's product-moment correlation
## 
## data:  df_MPR$age_at_scan and df_MPR$minQC
## t = 9.9316, df = 945, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2485926 0.3640051
## sample estimates:
##      cor 
## 0.307429
#GA & QC score
plot(GA_data$gestational_age, GA_data$minQC)

cor.test(GA_data$gestational_age, GA_data$minQC)
## 
##  Pearson's product-moment correlation
## 
## data:  GA_data$gestational_age and GA_data$minQC
## t = 2.3127, df = 356, p-value = 0.02131
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01824414 0.22250907
## sample estimates:
##       cor 
## 0.1216646
#GA & age at scan
plot(GA_data$gestational_age, GA_data$age_at_scan)

cor.test(GA_data$gestational_age, GA_data$age_at_scan)
## 
##  Pearson's product-moment correlation
## 
## data:  GA_data$gestational_age and GA_data$age_at_scan
## t = 1.0236, df = 356, p-value = 0.3067
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.04975865  0.15694051
## sample estimates:
##        cor 
## 0.05417122
#Min QC < 0.650 / 0.60
GA_data_qc <- GA_data %>%
  filter(minQC >= 0.60)
table(GA_data$preterm)
## 
##  VPM  LPM Term 
##    8   42  308
table(GA_data_qc$preterm)
## 
##  VPM  LPM Term 
##    7   36  293
table(GA_data$gestational_age)
## 
##  27  28  30  31  32  33  34  35  36  37  38  39  40  41  42 
##   2   3   1   2   3   3  10  10  16  29  40  76 131  26   6
table(GA_data_qc$gestational_age)
## 
##  27  28  30  31  32  33  34  35  36  37  38  39  40  41  42 
##   1   3   1   2   3   3   8   7  15  25  38  71 129  24   6

###Is minQC Driven by any specific region? QC by different scan type, age bins - seem to be that CSF and GM have smaller QC values than other regions

df <- full_data %>% group_by(scan_type) %>% distinct(subject_id, .keep_all = T)
p_scan <- minQC_by_region_dens(df, title = "QC Distribution by Region, By Scan Type") + facet_wrap(~scan_type, nrow = 2)
p_scan

table(df$scan_type)
## 
##     flair       MPR T1w_other       T2w 
##       569       947      1743      1021
#full data - MPR only
df_MPR <- full_data %>% group_by(scan_type) %>% distinct(subject_id, .keep_all = T) %>% filter(scan_type == "MPR")
#(minQC_by_region_hist(df_MPR, title = "QC Distribution by Region, MPR Only"))
(minQC_by_region_dens(df_MPR, title = "QC Distribution by Region, MPR Only"))

#GA only
#(minQC_by_region_hist(GA_data, title = "QC Distribution by Region, MPR Only, w/ GA"))
(minQC_by_region_dens(GA_data, title = "QC Distribution by Region, MPR Only, w/ GA"))

#Get QC dist for each Age Bin.
p_age <- minQC_by_region_dens(df_MPR, title = "QC Distribution by Region, MPR Only") + facet_wrap(~ageBin, nrow = 2)
p_age

table(df_MPR$ageBin)
## 
##    Newborn     Infant    Toddler  Preschool School_Age Adolescent 
##         15         77         99        198        248        310
#By scan X age
p_scanAge <- minQC_by_region_dens(df, title = "QC Distribution by Region, Age x Scan") + facet_wrap(~ageBin + scan_type, nrow = 3)
p_scanAge


###Manual QC and Auto QC similarity? SLIP paper used average QC >= 1.0 Odds Ratio? For the full dataset

df <- merge(full_data, qc_manual,by = c("subject_id", "session_id")) %>% group_by(scan_type) %>% distinct(subject_id, .keep_all = T) #273 of them
df_MPR <- df %>% filter(scan_type == "MPR")
plot(df_MPR$minQC, df_MPR$rawdata_image_grade)

cor.test(df_MPR$minQC, df_MPR$rawdata_image_grade)
## 
##  Pearson's product-moment correlation
## 
## data:  df_MPR$minQC and df_MPR$rawdata_image_grade
## t = 2.2877, df = 271, p-value = 0.02292
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01924455 0.25224240
## sample estimates:
##       cor 
## 0.1376472
sum(df$rawdata_image_grade >= 1) #222/273 using manual
## [1] 380
sum(df$minQC >= 0.65)#191/273 using SS auto
## [1] 227
sum(df$minQC >= 0.60)#266/273 using SS auto
## [1] 369
#if SS cut off were 0.65
a <- sum(df$minQC >=0.65 & df$rawdata_image_grade >= 1)
b <- sum(df$minQC <0.65 & df$rawdata_image_grade >= 1)
c <- sum(df$minQC >=0.65 & df$rawdata_image_grade < 1)
d <- sum(df$minQC <0.65 & df$rawdata_image_grade < 1)
OR <- (a*d)/(b*c)
df_QC_65 <- data.frame(SS_pass = c(a, c), SS_fail = c(b,d), row.names = c("manual_pass", "manual_fail"))
mosaicplot(df_QC_65, shade = F)

#if SS cut off were 0.6
a <- sum(df$minQC >=0.6 & df$rawdata_image_grade >= 1)
b <- sum(df$minQC <0.6 & df$rawdata_image_grade >= 1)
c <- sum(df$minQC >=0.6 & df$rawdata_image_grade < 1)
d <- sum(df$minQC <0.6 & df$rawdata_image_grade < 1)
OR <- (a*d)/(b*c)
df_QC_60 <- data.frame(SS_pass = c(a, c), SS_fail = c(b,d), row.names = c("manual_pass", "manual_fail"))
mosaicplot(df_QC_60, shade = F)

#plot overlap, by scan type
hist(df_MPR$minQC)

ggplot(df, aes(x=minQC, fill = as.factor(rawdata_image_grade))) + geom_histogram(bins = 20) + scale_x_continuous(breaks = pretty(df$minQC, n = 10)) + theme_minimal() + ggtitle("All data with QC, n = 472") + facet_wrap(~scan_type, nrow = 2)

#plot overlap, by age bin
ggplot(df, aes(x=minQC, fill = as.factor(rawdata_image_grade))) + geom_histogram(bins = 20) + scale_x_continuous(breaks = pretty(df$minQC, n = 10)) + theme_minimal() + ggtitle("All data with QC, n = 472") + facet_wrap(~ageBin, nrow = 2)

#plot overlap, by ageXscan
ggplot(df, aes(x=minQC, fill = as.factor(rawdata_image_grade))) + geom_histogram(bins = 20) + scale_x_continuous(breaks = pretty(df$minQC, n = 10)) + theme_minimal() + ggtitle("All data with QC, n = 472") + facet_wrap(~ageBin+scan_type, nrow = 3)

For the GA data odds ratio of being high QC under SS when high QC under manual is 2.43 (with SS value at 0.65) 2.35 with SS value at 0.6 80/101 pass manual QC, 56/101 pass SS auto QC (at 0.65)

df <- merge(GA_data, qc_manual,by = c("subject_id", "session_id"))
plot(df$minQC, df$rawdata_image_grade)

cor.test(df$minQC, df$rawdata_image_grade)
## 
##  Pearson's product-moment correlation
## 
## data:  df$minQC and df$rawdata_image_grade
## t = 1.4791, df = 64, p-value = 0.144
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.06299536  0.40597765
## sample estimates:
##       cor 
## 0.1818095
sum(df$rawdata_image_grade >= 1) #80/101 using manual
## [1] 52
sum(df$minQC >= 0.65)# 56/101 using SS auto
## [1] 49
#if SS cut off were 0.65
a <- sum(df$minQC >=0.65 & df$rawdata_image_grade >= 1)
b <- sum(df$minQC <0.65 & df$rawdata_image_grade >= 1)
c <- sum(df$minQC >=0.65 & df$rawdata_image_grade < 1)
d <- sum(df$minQC <0.65 & df$rawdata_image_grade < 1)
OR <- (a*d)/(b*c)
df_QC_65 <- data.frame(SS_pass = c(a, c), SS_fail = c(b,d), row.names = c("manual_pass", "manual_fail"))
mosaicplot(df_QC_65, shade = F)

#if SS cut off were 0.6
a <- sum(df$minQC >=0.6 & df$rawdata_image_grade >= 1)
b <- sum(df$minQC <0.6 & df$rawdata_image_grade >= 1)
c <- sum(df$minQC >=0.6 & df$rawdata_image_grade < 1)
d <- sum(df$minQC <0.6 & df$rawdata_image_grade < 1)
OR <- (a*d)/(b*c)
df_QC_60 <- data.frame(SS_pass = c(a, c), SS_fail = c(b,d), row.names = c("manual_pass", "manual_fail"))
mosaicplot(df_QC_60, shade = F)

#plot overlap
hist(df$minQC)

ggplot(df, aes(x=minQC, fill = as.factor(rawdata_image_grade))) + geom_histogram(bins = 20) + scale_x_continuous(breaks = pretty(df$minQC, n = 10)) + theme_minimal() + ggtitle("GA data with QC, n = 101")

*** Logistic regression to predict manual QC value from SS auto QC - minQC IS a sig predictor ( p = 0.119, beta est = 4.5) of “pass” status in manual QC — using only the MPR scans

df_MPR <- full_data %>% group_by(scan_type) %>% distinct(subject_id, .keep_all = T) %>% filter(scan_type == "MPR") 
df <- merge(df_MPR, qc_manual,by = c("subject_id", "session_id")) %>% mutate(manual_QC_pass = rawdata_image_grade >= 1)
df$manual_QC_pass<- factor(df$manual_QC_pass)
logit <- glm(manual_QC_pass ~ minQC, data = df, family = "binomial")
summary(logit)
## 
## Call:
## glm(formula = manual_QC_pass ~ minQC, family = "binomial", data = df)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   -1.497      1.191  -1.256   0.2091  
## minQC          4.507      1.792   2.514   0.0119 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 262.94  on 272  degrees of freedom
## Residual deviance: 255.18  on 271  degrees of freedom
## AIC: 259.18
## 
## Number of Fisher Scoring iterations: 4

Notes after impromptu meeting with Jenna on 01/31 on QC stuff GM/CSF/WM – could just be sums? Is QC value an average of these sums? Doesn’t seem likely given the distributions. There is specific QC ratings for the manual ratings in the respublika folder for the SLIP radiology paper. This could be useful to look at specific ratings of the images. The specific images used to rate are also in this folder I think


###Looking at Individual Scans Take a look at individual scans that have - borderline auto QC 0.6/0.65 - mismatched auto & manual QC - mid-low auto QC 0.2-0.4 Look at about ~2-3 scans from each category in fsl viewer on Respublika Screenshot them for google slides. Choosing the scans to be looked at:

borderline_autoQC <- filter(GA_data, minQC < 0.65 & minQC >= 0.55)
df <- merge(GA_data, qc_manual,by = c("subject_id", "session_id"))
mismatch_QC_mnPS <- filter(df, rawdata_image_grade >= 1 & minQC < 0.65)
mismatch_QC_ssPS <- filter(df, rawdata_image_grade < 1 & minQC >= 0.65)
midlow_autoQC <- filter(GA_data, minQC < 0.4 & minQC > 0.2)
set.seed(42)
s1 <- borderline_autoQC[sample(nrow(borderline_autoQC), 1, replace = FALSE), ]
s2 <-mismatch_QC_mnPS[sample(nrow(mismatch_QC_mnPS), 1, replace = FALSE), ]
s3 <-mismatch_QC_ssPS[sample(nrow(mismatch_QC_ssPS), 1, replace = FALSE), ]
s4 <-midlow_autoQC[sample(nrow(midlow_autoQC), 1, replace = FALSE), ]
full_data[which(full_data$minQC == max(full_data$minQC)),]
## # A tibble: 1 × 141
## # Rowwise: 
##   study_id  subject_id  session_id sex   adjusted_age_in_days scan_id processing
##   <chr>     <chr>       <chr>      <chr>                <int> <chr>   <chr>     
## 1 SLIP 2022 sub-HM14RH… ses-20105… Fema…                 6286 sub-HM… SynthSeg  
## # ℹ 134 more variables: subject <chr>, general.white.matter_qc <dbl>,
## #   general.grey.matter_qc <dbl>, general.csf_qc <dbl>, cerebellum_qc <dbl>,
## #   brainstem_qc <dbl>, thalamus_qc <dbl>, putamen.pallidum_qc <dbl>,
## #   hippocampus.amygdala_qc <dbl>, total.intracranial <dbl>,
## #   left.cerebral.white.matter <dbl>, left.cerebral.cortex <dbl>,
## #   left.lateral.ventricle <dbl>, left.inferior.lateral.ventricle <dbl>,
## #   left.cerebellum.white.matter <dbl>, left.cerebellum.cortex <dbl>, …
full_data[which(full_data$minQC > 0.75 & full_data$adjusted_age_in_days < 1500),]
## # A tibble: 1 × 141
## # Rowwise: 
##   study_id   subject_id session_id sex   adjusted_age_in_days scan_id processing
##   <chr>      <chr>      <chr>      <chr>                <int> <chr>   <chr>     
## 1 SLIP 2023… sub-HM18O… ses-32572… Male                  1399 sub-HM… SynthSeg  
## # ℹ 134 more variables: subject <chr>, general.white.matter_qc <dbl>,
## #   general.grey.matter_qc <dbl>, general.csf_qc <dbl>, cerebellum_qc <dbl>,
## #   brainstem_qc <dbl>, thalamus_qc <dbl>, putamen.pallidum_qc <dbl>,
## #   hippocampus.amygdala_qc <dbl>, total.intracranial <dbl>,
## #   left.cerebral.white.matter <dbl>, left.cerebral.cortex <dbl>,
## #   left.lateral.ventricle <dbl>, left.inferior.lateral.ventricle <dbl>,
## #   left.cerebellum.white.matter <dbl>, left.cerebellum.cortex <dbl>, …
s4
## # A tibble: 1 × 143
## # Rowwise: 
##   study_id   subject_id session_id sex   adjusted_age_in_days scan_id processing
##   <chr>      <chr>      <chr>      <chr>                <int> <chr>   <chr>     
## 1 SLIP 2023… sub-HM37C… ses-21558… Fema…                  105 sub-HM… SynthSeg  
## # ℹ 136 more variables: subject <chr>, general.white.matter_qc <dbl>,
## #   general.grey.matter_qc <dbl>, general.csf_qc <dbl>, cerebellum_qc <dbl>,
## #   brainstem_qc <dbl>, thalamus_qc <dbl>, putamen.pallidum_qc <dbl>,
## #   hippocampus.amygdala_qc <dbl>, total.intracranial <dbl>,
## #   left.cerebral.white.matter <dbl>, left.cerebral.cortex <dbl>,
## #   left.lateral.ventricle <dbl>, left.inferior.lateral.ventricle <dbl>,
## #   left.cerebellum.white.matter <dbl>, left.cerebellum.cortex <dbl>, …

Max Min QC 0.7655 slip 2023-03 sub-HM14RHXMW_ses-201050308439procId006006ageDays_acq-MPR_run-001_T1w \(\checkmark\) adj age 6286 days

High QC 0.7531 age 1399 days slip 2022 sub-HM18OTNFZ_ses-325726739876procId001362ageDays_acq-TSE_run-001_T1w \(\checkmark\)

Borderline slip 2023-03 sub-HM36EP5TB_ses-297207596745procId001242ageDays_acq-TSE_run-002_T1w \(\checkmark\) minQC 0.6318 adj age 1281 days

slip 2023-02 sub-HM34S3Z1_ses-95046832730procId000534ageDays_run-001_T2w minQC 0.6445 adj age 573 days

slip 2023-02 sub-HM1WCY4QA_ses-186586445493procId003011ageDays_run-002_FLAIR minQC 0.6284 adj age 3051 days

Midlow 23-03 age 1065days minQC 0.3952 sub-HM1WEGOWD_ses-320995680774procId001027ageDays_run-003_T2w ^ !!! I can’t figure out how to display T2 images on fsleyes properly …

23-02 age 5531days minQC 0.2553 sub-HM25RQYI4_ses-222549305714procId005496ageDays_run-004_T2w

23-02 age 1406days minQC 0.3752 sub-HM2VQDK5U_ses-61776394344procId001366ageDays_run-003_T1w

Growth Charts

Here I will fit a growth chart curve for the bins of data with GA. Going off of Jenna’s code.

Get GA data that passes QC standards sex as factor logAge, base 10 (post-conception offset of 280 days if post-conception age not available)

Ran gamlss on term QC corrected sample with logAge and sex. The centile curves looks very zig-zag? Also geom_point layer cannot be computed. Keeps giving error: Error in geom_point(): ! Problem while computing aesthetics. ℹ Error occurred in the 1st layer. Caused by error in new_tibble(): ! names must not be NULL.

###Running the Model

GA_qc_manual <- merge(GA_data, qc_manual, by = c("subject_id", "session_id")) %>% filter(rawdata_image_grade >= 1) %>% mutate(sex = as.factor(sex))
levels(GA_qc_manual$sex) <- c("F", "M") #male = 0, female = 1
GA_qc_SS <- GA_data %>% filter(minQC >= 0.65) %>% mutate(sex = as.factor(sex))
levels(GA_qc_SS$sex) <- c("F", "M") #male = 0, female = 1

# Build the growth chart model
p <- "TCV" # specify the phenotype
#Data split by preterm
GA_qc_SS_vpm <- GA_qc_SS %>% filter(preterm == "VPM")
GA_qc_SS_lpm <- GA_qc_SS %>% filter(preterm == "LPM")
dfSSterm <- GA_qc_SS %>% filter(preterm == "Term") %>% select(c("sex", "logAge", "TCV")) 

# No SurfaceHoles in the SynthSeg (SS) model
#The df should have no NAs. So create df for the specific variables needed. Or remove all the variables with NAs.
formulaSs <- as.formula(paste0(p, "~fp(logAge, npoly=3) + sex - 1"))
growthChartModelSs <-gamlss(formula = formulaSs,
                            sigma.formula = formulaSs,
                            nu.formula = as.formula(paste0(p, "~1")),
                            family = GG,
                            data = dfSSterm,
                            control = gamlss.control(n.cyc = 200),  # See (2)
                            trace = F)
## GAMLSS-RS iteration 1: Global Deviance = 5846.346 
## GAMLSS-RS iteration 2: Global Deviance = 5844.827 
## GAMLSS-RS iteration 3: Global Deviance = 5844.532 
## GAMLSS-RS iteration 4: Global Deviance = 5844.421 
## GAMLSS-RS iteration 5: Global Deviance = 5844.379 
## GAMLSS-RS iteration 6: Global Deviance = 5844.363 
## GAMLSS-RS iteration 7: Global Deviance = 5844.356 
## GAMLSS-RS iteration 8: Global Deviance = 5844.354 
## GAMLSS-RS iteration 9: Global Deviance = 5844.353

###Predicting Centiles

# Predict the median centile of each model
medianCentileSs <- predictCentilesForAgeRange(growthChartModelSs, dfSSterm$logAge)
## new prediction 
## new prediction 
## new prediction 
## new prediction
# Calculate the age at peak (median) phenotype value
ageAtPeakSs <- 10^(sort(dfSSterm$logAge)[which.max(medianCentileSs)])

# Predict a set of centiles for each model
centileCurvesSs <- c()
desiredCentiles <- c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)
for (i in c(1:length(desiredCentiles))){
  centileCurvesSs[[i]] <- predictCentilesForAgeRange(growthChartModelSs, dfSSterm$logAge, 
                                                     cent=desiredCentiles[[i]])}
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction

###Plotting the Curves

# Set up a list of tick marks to use on log(post-conception age) x-axes
tickMarks <- c()
for (year in c(0, 1, 2, 5, 10, 20)){ # years
  tickMarks <- append(tickMarks, log(year*365.25 + 280, base=10)) #currently has the standard 280 adjustment, but is this reasonable? Perhaps for the scale yes.
}
tickLabels <- c("Birth", "1", "2", "5", "10", "20")


# Plot the original data and the set of centile curves on a figure
plotSs <- ggplot() +
  #geom_point(aes(x=dfSSterm$logAge, dfSSterm[, p]), alpha=0.5) +
  geom_line(aes(x=sort(dfSSterm$logAge), y=centileCurvesSs[[1]]), alpha=0.4) +
  #geom_line(aes(x=dfSSterm$logAge, y=centileCurvesSs[[2]]), alpha=0.6) +
  #geom_line(aes(x=dfSSterm$logAge, y=centileCurvesSs[[3]]), alpha=0.8) +
  #geom_line(aes(x=dfSSterm$logAge, y=centileCurvesSs[[4]])) +
  #geom_line(aes(x=dfSSterm$logAge, y=centileCurvesSs[[5]]), alpha=0.8) +
  #geom_line(aes(x=dfSSterm$logAge, y=centileCurvesSs[[6]]), alpha=0.6) +
  #geom_line(aes(x=dfSSterm$logAge, y=centileCurvesSs[[7]]), alpha=0.4) +
  scale_x_continuous(breaks=tickMarks, labels=tickLabels, 
                     limits=c(tickMarks[[1]], max(dfSSterm$logAge))) +
  labs(title=paste0("Sample Growth Chart for ", p)) + 
  xlab("Age at scan (log(years))") +
  ylab(paste0(p, " Centile")) + 
  theme(axis.line = element_line(colour = "black"),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        text = element_text(size = 18))

print(plotSs)
## Warning: Removed 6 rows containing missing values (`geom_line()`).

# Calculate the closest centile to each subject's phenotype value
phenoCentilesSs <- calculatePhenotypeCentile(growthChartModelSs, dfSSterm[[p]], dfSSterm$logAge, dfSSterm$sex)
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction 
## new prediction
regionsSs <- rep(p, length(phenoCentilesSs))
idxesSs <- c(1:length(phenoCentilesSs))

# Plot the centiles of the phenotypes in a violin plot
dfSsViolin <- data.frame(idxesSs, regionsSs, phenoCentilesSs)
plotPhenoCentSs <- ggplot(data=dfSsViolin, aes(regionsSs, phenoCentilesSs)) +
  geom_violin(color="gray", fill="gray", alpha=0.35) +
  geom_jitter(height = 0, width=0.15, alpha=0.65) +
  labs(title=paste0("Centiles (Phenotype = ", p,")")) +
  xlab("SynthSeg") +
  ylab("Centile Value") +
  theme(axis.line = element_line(colour = "black"),
        # panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        text = element_text(size = 18))

print(plotPhenoCentSs)

plot(dfSSterm$logAge, dfSSterm$TCV)

#Play - LBCC Got LBCC data from Lena

##Load Data

lbcc <- load_data(names = "lbcc", lbcc_ga_file) %>% bind_rows(.) %>%  select(-1, -2) #drop first two columns, repeat of row numbers
length(unique(lbcc$participant))#15779 unique participant IDs
## [1] 15779

##Inspect Data Distribution ~10K from ABCD (over 50% of data points) Harvard fetal and dHCP data - aren’t born yet lol. DCHS study might not be appropriate, it is a dataset of southafrican moms & babies. Delete it! CONTE & CHILD & TEBC do not have regional info (but might have for subcortical volumes) !!Add the SLIP scans to this dataset.

table(lbcc$study)
## 
##          ABCD         CHILD         CONTE          DCHS          dHCP 
##         10583            68          1813           137           487 
##    dHCP-fetal     FinnBrain Harvard_fetal       NIH-IRP          PING 
##           265           273           122          1224           538 
##           PNC          SLIP          TEBC           UKB 
##           247          1709           252          1277
sum(is.na(lbcc$PCW_at_birth)) #603
## [1] 603
sum(is.na(lbcc$PCW_at_birth_recoded))#338
## [1] 338
range(na.omit(lbcc$PCW_at_birth))#[2 50]
## [1]  2 50
boxplot(lbcc$PCW_at_birth)

boxplot(lbcc$PCW_at_scan)

hist(lbcc$PCW_at_scan)

#Preterm categories (broad)
# Create a new column indicating cut points: specifying by weeks.
lbcc$preterm <- cut(lbcc$PCW_at_birth, breaks = c(-Inf, 31.9, 36.9, Inf), labels = c("VPM", "LPM", "Term"), include.lowest = TRUE)
table(lbcc$preterm)
## 
##   VPM   LPM  Term 
##   555  2793 15044
table(lbcc[match(unique(lbcc$participant), lbcc$participant),"preterm"])
## 
##   VPM   LPM  Term 
##   461  2069 12715